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Abstract 

The duck {Anas platyrhynchos) is one of the principal natural hosts of influenza A viruses. We 
present the duck genome sequence and perform deep transcriptome analyses to investigate 
immune-related genes. Our data indicate that the duck possesses a contractive immune gene 
repertoire, as in chicken and zebra finch, and this repertoire has been shaped through lineage- 
specific duplications. We identify genes that are responsive to influenza A viruses using the lung 
transcriptomes of control ducks and ones that were infected with either a highly pathogenic (A/ 
duck/Hubei/49/05) or a weakly pathogenic (A/goose/Hubei/65/05) H5N1 virus. Further, we show 
how the duck's defense mechanisms against influenza infection have been optimized through the 
diversification of its )3-defensin and butyrophilin-like repertoires. These analyses, in combination 
with the genomic and transcriptomic data, provide a resource for characterizing the interaction 
between host and influenza viruses. 



Birds are distinct from other vertebrates in many characteristics, both morphological (for 
example, feathers and eggs) and physiological (for example, their lightweight skeletons and 
high metabolic rates). Birds occupy habitats from the Arctic to the Antarctic, and they have 
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body size ranging from 5 cm to 2.8 m. The diversity found in birds inspires us to investigate 
the genomic differences underlying their phenotypic differences from mammals and the 
variation within the avian class. The sequencing of the chicken genome 1 provided the first 
insight into the evolutionary events between birds and other vertebrates. Avian evolutionary 
events have subsequently been elucidated with the recent availability of the zebra finch and 
turkey genomes 2 ' 3 . Additional avian genomes, however, are needed to provide more detailed 
evolutionary information and insight into adaptation mechanisms. The duck (A. 
platyrhynchos) is particularly well suited for further exploration in these areas. Ducks 
diverged from the related chicken and turkey and zebra finch approximately 90-100 million 
years ago 4 . The duck is also one of the most economically important waterfowl as a source 
of meat, eggs and feathers. 

Of special interest to medicine and agriculture is the fact that ducks serve as the principal 
natural reservoir for influenza A viruses and harbor all 16 hemagglutinin (HA) and 9 
neuraminidase (NA) subtypes that are currently known 5,6 , with the exception of the H13 and 
H16 subtypes 7 . Often, influenza strains cause the duck no harm. However, the long-standing 
equilibrium between influenza A viruses and the duck has been disrupted with the 
emergence of H5N1 viruses 8 . H5N1 strains have caused unprecedented outbreaks in poultry 
in more than 60 countries and have caused 622 human infections (as of March 2013), with 
an overall fatality rate of 59% in humans. Recently, it was reported that an engineered 
influenza virus encoding hemagglutinin from the highly pathogenic avian H5N1 influenza A 
strains can be transmitted between ferrets 9 , emphasizing the potential for a human pandemic 
to emerge from birds. Furthermore, weakly pathogenic avian influenza A viruses, such as 
H9N2 (ref. 10) and H7N2 (ref. 1 1) strains, have caused transient infections in humans 12 . 
The exceptional virulence of avian influenza viruses in humans, in combination with their 
ongoing evolution in birds, motivates us to better understand host immune responses to 
avian influenza viruses. We report here the duck whole-genome sequence and compared it 
to the genomes of mammals and other birds. We performed deep transcriptome analyses of 
lungs from control ducks and ones that were infected with either a highly pathogenic (A/ 
duck/Hubei/49/05, DK/49) or a weakly pathogenic (A/goose/Hubei/65/05, GS/65) H5N1 
virus". 

RESULTS 

The genome landscape 

We sequenced the genome of a 10-week-old female Beijing duck using methods similar to 
those applied to sequence the giant panda genome 14 . In total, we generated 77 Gb of paired- 
end reads (approximately 64-fold coverage of the whole genome) with an average length of 
50 bp (Supplementary Figs. 1-3 and Supplementary Table 1). Using SOAPdenovo 
(Supplementary Note), we combined short reads to generate a draft assembly, which 
consisted of 78,487 scaffolds and covered 1.1 Gb. The contig N50 and scaffold N50 values 
of this draft assembly were 26 kb and 1 .2 Mb, respectively (Supplementary Table 2). We 
then constructed superscaffolds and created chromosomal sequences according to the duck 
genetic map 15 and the comparative physical map 16 . This effort resulted in the construction 
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of a total of 47 superscaffolds, which contained 225 scaffolds and spanned 289 Mb 
(Supplementary Table 3). 

We generated transcriptomes from several different tissues (Supplementary Note). These 
transcriptomes comprised 1.87 million ESTs and approximately 121 million 75-bp and 
approximately 917 million 90-bp paired-end reads, which were generated using either the 
454/Roche Life Sciences Analyzer or Illumina Genome sequencing technology (Table 1 and 
Supplementary Tables 4 and 5). Next, we estimated the coverage of the duck assembly with 
alignments to seven finished BACs (completed independently using Sanger sequencing 
technology 17 ), 240 microsatellite markers 15 and the 319,996 ESTs assembled in this project 
(Supplementary Note) using BLASTN (E value < 1 x 10 -5 ). These analyses suggested that 7 
BACs covering 640 kb on chromosomes 1, 3 and 4 were aligned over more than 95% of 
their lengths (Supplementary Fig. 4). Similarly, greater than 97% of 240 microsatellite 
markers and 96% of the 319,996 ESTs were aligned to the duck assembly. We further 
aligned the duck and chicken assemblies to the human using Narcisse. This effort showed 
that the coverage of the two avian assemblies of the human genome (GRCh37) was similar, 
indicating that the quality of the duck and chicken assemblies was comparable. In addition, 
in aligning the duck assembly onto the chicken genome sequence using GLINT software 
with its default parameters 18 , only 41 of the 78,847 duck scaffolds were identified as 
possible chimeras when applying the criterion that they contain sequences that partially map 
to two different chicken chromosomes. These data indicate that the duck assembly has good 
coverage and generally provides a reasonable substrate for the analysis presented in this 
study. 

We defined reference gene sets from the duck assembly using the BGI and Ensembl 
pipelines (Supplementary Note). For the BGI reference set, we aligned the duck 
transcriptome data and human (22,232) and chicken (16,701) genes to the duck genome 
assembly, yielding 15,065 predicted protein-coding duck genes. We applied GENSCAN 19 
and Augustus 20 with the default parameters and predicted 32,383 and 22,739 protein-coding 
genes in the duck draft genome for the respective reference sets (Supplementary Table 6). 
Finally, we integrated all gene sources and created a reference set containing 19,144 genes, 
constituting approximately 2.3% of the total duck assembly. Of these 19,144 genes, 9,678 
were mapped to categories established by the Gene Ontology (GO) Project, 14,725 had 
orthologs in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and 18,012 
were supported by the duck ESTs (Supplementary Figs. 5 and 6). Comparing the BGI 
reference gene set to the Ensembl reference gene set, which contained 15,634 protein-coding 
genes (Supplementary Note), we found that 14,049 of the BGI reference genes and 13,81 1 
of the Ensembl reference genes were predicted by both pipelines, whereas the remaining 
5,095 and 1,823 genes from the two sets, respectively, were annotated with the BGI and 
Ensembl pipelines alone. We predicted 817 noncoding RNA genes in the duck assembly 
with a homology-based annotation method (Supplementary Tables 7 and 8 and 
Supplementary Note). In addition, we identified 29 families of DNA transposons, 61 
families of retrotransposons and 414 families of microsatellites, comprising 5.9% of the 
duck genome (Supplementary Table 9 and Supplementary Note). 
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Changes in gene family size 

We examined large-scale differences in gene complements within birds and between birds 
and either mammals or fish using the duck reference gene set and combined gene sets from 
three birds, one reptile-amphibian, three fish and eight mammals (Fig. 1). Using a likelihood 
model 21 , we estimated at least 23,044 genes distributed in 14,466 gene families in the most 
recent common ancestor (MRCA) of 17 vertebrates (Fig. 1). We estimated the average rate 
of change across all 14,466 gene families in the MRCA via maximum likelihood under a 
single- or multiple-rate model for each clade 21 . A comparison of likelihood values showed 
that the best-fit model (P < 0.01) estimated the average rates of genomic turnover 
(represented by the X value, given per gene per million years) to be 0.0012 for teleosts, 
0.0019 for Xenopus tropicalis, 0.0011 for reptiles (including birds) and 0.0017 for 
mammals. These estimates are similar to previous values for yeast (0.0020) 21 , Drosophila 
melanogaster (0.0023) 22 and mammals (0.0017) 23 , supporting the theory that rates of gene 
duplication and deletion across eukaryotes are comparable. 

We compared gene family sizes at parent and daughter nodes along the vertebrate tree (Fig. 
1), finding that the total numbers of contractions outnumbered expansions in the MRCAs of 
teleosts, mammals and reptiles. This tendency, where contractions outnumbered expansions, 
was continued into the duck and turkey. However, the reverse case, where expansions 
outnumbered contractions, seemed to be true in the chicken and zebra finch. From the 
terminal branch leading to the duck and chicken, we inferred a gain of 562 out of 1,029 
genes and a loss of 1,423 out of 1,249 genes in -90 million years (Fig. 1). 

Expansion and contraction of immune gene families 

We identified the predicted duck, chicken and zebra finch genes that were homologs of 
3,542 human and 1,415 mouse immune genes (comprising 4,344 unique immune genes), 
which were derived from analyses of Import, IRIS, Septic Shock Group, MAPK-NF-kB 
network and immunome databases using TreeFam 24 . In total, 6,044 human and 5,715 mouse 
genes were clustered into 3,726 immune-related gene families, all of which included at least 
one of the 4,344 unique immune genes. However, only 3,116 duck genes, 3,294 chicken 
genes and 3,355 zebra finch genes were clustered into the immune-related gene families, 
indicating that avian immune gene repertoires were contractive. 

We used cytokines as an example to compare immune gene repertoires in mammals and 
birds. After detecting cytokines in the above five species with TreeFam 24 , we manually 
queried their repertoires against the non-redundant database in NCBI and examined their 
assemblies in Ensembl (version 57). Using the combined duck transcriptome and assembly 
data, we identified 150 duck cytokine genes; although this number resembles the numbers of 
such genes identified in chicken (149 genes) and zebra finch (150 genes), it is substantially 
lower than the numbers of mammalian cytokine genes (230 in humans and 218 in mice) 
(Table 2). 

We found that the duck genome contains 16 defensins distributed over 3 scaffolds, a number 
that was slightly higher than that of the 14 defensins found in chicken 25 . Closing the 
sequence gaps within the three scaffolds via Sanger sequencing, we identified three 
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additional duck defensin genes (including one pseudogene). Structural and phylogenetic 
analyses of avian defensins and mammalian fi-defensins showed that all duck defensins are 
(3-defensins (Supplementary Fig. 7a), supporting the hypothesis that birds lack a- and 6- 
defensins 25,26 . Molecular phylogenetic analysis indicated that the avian defensin genes were 
divided into 12 subfamilies: 1 subfamily (AvDB14) lost its member in the zebra finch, 2 
subfamilies (AvDBl-AvDB3 and AvDB6-AvDB7) contained lineage-specific duplications 
(LSDs), and 9 subfamilies (AvDB2, AvDB4, AvDB5 and AvDB8-AvDB13) remained one-to- 
one orthologs in three birds (Supplementary Fig. 7a). Evolutionary comparison of the avian 
defensin genes suggested that single clusters of these genes in duck, chicken and zebra finch 
were collinear, with the exception of AvDBl, AvDB3 and AvDB14 (Supplementary Fig. 7b). 
These observations suggest that the ancient Neognathae had 13 avian defensin genes, 
including AvDBl-AvDB5 and AvDB7-AvDB14. Gene duplications along with the 
pseudogenization of defensin genes further increased the repertoire of these genes in both 
duck and zebra finch. Two gene duplication events of AvDBl have been described in the 
zebra finch; however, duplication of the ancient AvDB7 gene seems to have led to the 
introduction of the AvDB6 gene in chicken. 

LSDs in birds 

Using a cutoff of 2 duplication events, we identified 5, 76, 577 and 1,752 LSDs in turkey, 
duck, chicken and zebra finch, respectively (Supplementary Note). Of the 14 gene families 
that contained 76 duck LSDs, we found that 3 were significantly expanded in this lineage 
(family-wide P value < 0.0005). One family is a BTNL (butyrophilin-like) family, which 
includes the mammalian BTNL genes, with the exception of BTNL9 (Supplementary Fig. 
7c). Domain prediction using SMART software 27 suggested that 6 out of 17 duck gene 
fragments encoded a structure that was typical of BTNLs 28 . The high prevalence of these 
genes in duck is in sharp contrast to their frequency in chicken and turkey, where only 1 or 2 
out of 4 genes encoded this structure (Supplementary Fig. 7d). The second family 
significantly expanded in the duck lineage was an olfactory receptor gene family, and the 
third family was a novel gene family that included only five duck epidermal growth factor 
(EGF)-like genes (Supplementary Table 10). 

Evidence for positive selection 

We performed likelihood ratio tests (LRTs) for positive selection with the codeml program 
under a branch-site model 29 using 8,409 avian quaternions (Supplementary Note). These 
LRTs (false discovery rate (FDR) <0.05) predicted that 2.7%, 5.2%, 6.2% and 10.9% of 
genes showed evidence of positive selection in chicken, duck, zebra finch and turkey, 
respectively. These proportions are significantly lower than the previously reported values 
(10.7% in chicken and 1 1.3% in zebra finch) 30 . This large variation in the proportion of 
genes showing evidence of positive selection within a particular lineage is partly attributed 
to alignment problems and poor sequence quality 31,32 . A comparison across species 
suggested that the proportion of positively selected genes in duck was in the range of the 
proportions detected in high-quality genomes, such as those of the chicken (2.7%) and zebra 
finch (6.2%). This observation, along with reports from the assessment of the quality of the 
duck draft genome (Supplementary Note), encouraged us to further investigate the 
biological functions of positively selected genes in the duck. Ingenuity Systems Pathway 
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Analysis (IPA) showed that positively selected genes in duck were enriched in cellular 
assembly and organization, cellular function and maintenance, and cell signaling. In 
addition, genes related to amino acid metabolism and small-molecule biochemistry tended to 
be under positive selection in duck (Supplementary Table 11). 

Gene profiles after avian influenza virus infections 

We examined global gene expression profiles using seven lung trans criptomes of ducks 
infected with H5N1 viruses and control individuals (Table 1). Alignment of approximately 
916 million Illumina paired-end reads with the merged reference gene set suggested that 
between 17,951 and 18,276 genes were expressed in these lung tissues, and 16,404 genes 
were transcribed in all 7 lung tissues. In general, the overall gene expression patterns of 
DK/49-infected animals 1-2 d after inoculation were similar to those of GS/65 -infected 
animals 1 d after inoculation, whereas the gene expression profiles of DK/49-infected 
animals 3 d after inoculation were similar to those of GS/65 -infected animals 2-3 d after 
inoculation (Fig. 2a). Compared to control animals, DK/49-infected ducks had 2,257, 3,101 
and 3,066 genes with significantly altered expression (FDR <0.001, fold change >2) 1-3 d 
after inoculation, and GS/65 -infected ducks had 916, 2,060 and 1,251 genes with 
significantly altered expression (FDR <0.001, fold change >2) 1-3 d after inoculation (Table 
1). These findings, together with hierarchical clustering analysis, showed an appreciably 
more dynamic response during infection with GS/65 virus compared to DK/49 virus. Further 
comparison suggested that 1,506, 1,436 and 1,396 genes had significantly different 
expression levels (FDR <0.001, fold change >2) in DK/49-infected ducks and GS/65- 
infected ducks 1-3 d after inoculation, respectively (Table 1 and Supplementary Fig. 8). 
Compared to the duck reference gene set (with 5.2% predicted to be under positive 
selection), these sets of differentially expressed genes were predicted to have a slightly 
higher proportion of genes (5.4-7.0%) under positive selection (Table 1), supporting the 
hypothesis that ducks might alter their sensitivity to avian influenza viruses through the 
positive selection of many genes evolving in the host response to these viruses 33,34 . 

We merged the sets of genes determined to have differential expression in DK/49- or GS/65- 
infected ducks compared with control animals into set 1 (5,038) and set 2 (2,741), 
respectively, and we combined the genes with differential expression in DK/49- and GS/65- 
infected ducks into set 3 (3,232) (Table 1). IPA of sets 1 and 2 identified four enriched 
categories related to cell activation and one enriched category associated with antigen 
presentation (P < 0.0005; Supplementary Table 12), suggesting that ducks infected with 
DK/49 or GS/65 virus both have severe disruption of cellular functions. IPA of set 3 also 
identified three enriched categories related to cell activation (cellular movement, cellular 
growth and proliferation, and cellular development), as well as one associated with 
molecular transport and one related to lipid metabolism (Supplementary Table 12). 

Innate immune response in avian influenza virus infections 

Transcriptome analysis of 150 cytokines (listed in Table 2) showed that, compared to 
control ducks, ones infected with DK/49 or GS/65 had 74 cytokines with expression levels 
that were significantly changed (FDR < 0.001, fold change > 2) 1-3 d after inoculation (Fig. 
2b; full gene names are given in Supplementary Table 13). Of these cytokines, 20 growth 
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factor genes (BMP1-BMP5, EGF, FGF9, FGF12, FGF13, GDF10, GDF11, HGF, IGF1, 
INHBA, NGFB, NRG3, PDGFD, PGF, TGFB2 and TGFB3) had expression that was 
significantly decreased by 2.0- to 9.8-fold, and 13 growth factor genes (BMP8, EFNA1, 
FGF8, FGF18, FGF23, GDF9, INHBB, INHBC, KITLG, MSTN, NODAL, NRG2 and 
VEGFC) had expression that was significantly increased by 2.1- to 379-fold with DK/49 or 
GS/65 infection. Similarly, one tumor necrosis factor (TNF) gene (TNFSF1 1), one 
interleukin (IL)-17 gene (IL17D) and three CXC chemokine genes (CXCL12, CXCL13L2 
and CXCL14) had expression that was markedly decreased by 2.2- to 5.8-fold, whereas one 
IL-17 gene (IL17A), three TNF genes (TNFSF4, TNFSF6 and TNFSF10) and four CXC 
chemokine genes (CX3CL1, CXCL13, IL8A and IL8B) had expression that was markedly 
increased by 2.3- to 34-fold after infection with DK/49 and GS/65. However, all five 
interferon genes (IFNA, IFNE, IFNG, IFNK and IL28A), one C chemokine gene (XL1), nine 
CC chemokine genes (CCL4L2, CCL5, CCL6, CCL17, CCL19, CCL20, CCL21, CCL23 and 
CCL24) and ten interleukin or interleukin receptor genes (IL1, IL6, IL10, IL13, IL12A, 
IL12B, IL19, IL22, LEP and LIF) had expression that was markedly upregulated by 2.1- to 
1,414-fold in DK/49- or GS/65-infected ducks compared with control ducks. Compared with 
GS/65-infected ducks, DK/49-infected ducks had 7 cytokine genes with expression that was 
elevated by 2.4- to 46-fold 1-2 d after inoculation that then had expression decreased by 2.3- 
to 7-fold 2-3 d after inoculation, 12 cytokine genes with expression that was significantly 
downregulated (by 2.0- to 4.2-fold) and 38 cytokine genes with expression that were 
significantly upregulated (by 2.0- to 1,414-fold) 1-3 d after inoculation (FDR <0.001, fold 
change >2; Fig. 2b). 

DDX58, IFITM3 and IFIT1-IFIT3 have key roles in the antiviral response to avian influenza 
virus infection 35-37 in mammals. Transcriptome analysis showed that the expression levels 
of the DDX58, IFITM3 and AvIFIT genes (the gene name of AvIFIT is given in 
Supplementary Fig. 9) in both DK/49- and GS/65-infected ducks were markedly increased 
by 6.9- to 440-fold 1-3 d after inoculation, with peak expression (increased by 12.5- to 440- 
fold) at 2 d, compared with control ducks (Fig. 2b). Similar to DDX58, two additional RNA 
helicases (ADAR and DHX58) showed expression that was significantly elevated by 2.1- to 
30-fold 1-3 d after inoculation, with peak expression (increased by 3.2- to 30-fold) at 2 d, in 
both DK/49- and GS/65-infected ducks compared with control ducks, indicating that these 
three RNA helicases have key roles in the host response to avian influenza viruses in duck. 
Moreover, similar to AvIFIT, which had altered gene expression in infected ducks, the genes 
for three additional interferon-induced proteins (IFIH1, IFITM5 and IFITM10) showed 
significantly different expression levels during infection with DK/49 virus 1-3 d after 
inoculation, with peak expression at 2 d. Pronounced changes in gene expression for IFIH1, 
IFITM5 and IFITM10 were observed in the GS/65-infected ducks 2 d after inoculation, 
whereas only minor changes in IFITM5 and IFITM10 expression were detected in GS/65- 
infected ducks at 1 and 3 d after inoculation (Fig. 2b). Toll-like receptors (TLRs) are 
involved in host-virus interactions and lead to the secretion of interferons by infected cells. 
Consistent with the changes observed in the expression of the five interferon genes, nine 
TLR genes, including two members not found in mammals (TLR15 and TLR21), had 
expression that was significantly increased by 2.0- to 7.5-fold in DK/49- or GS/65 -infected 
ducks 1-3 d after inoculation compared with control ducks (Fig. 2b). However, the 
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immunoglobulin M (IgM) locus, three T cell receptors genes (TRA, TRD and TRG) and four 
genes encoding CD molecules (CD3E, CD4, CD8A and CD40LG) showed significantly 
decreased expression in DK/49- or GS/65-infected ducks compared with control ducks 
(FDR <0.001, fold change >2). In addition, three major histocompatibility complex (MHC) 
genes (Anpl-DRA, TAP1 and TAP2) and four colony-stimulation factor receptor genes 
(CSF2RA, CSF2RBA, CSF2RBB and CSF3R) had significantly elevated expression in 
DK/49- or GS/65-infected ducks compared with control ducks (FDR <0.001, fold change 
>2; Fig. 2b). 

LSDs of immune genes responsive to avian influenza viruses 

Mammalian defensins have been proposed to contribute to the immune response to avian 
influenza virus infection. In mice, the expression of P-defensins 3 and 4 is significantly 
higher in avian influenza virus-infected airways 38 , and, in humans, P-defensins inhibit avian 
influenza virus replication and increase the uptake of these viruses by neutrophils 39 . 
Similarly, key roles for BTNLs in immune responses have been extensively reported in 
mammals: four BTNLs (BTNL1, BTNL2, BTNL4 and BTNLS) can attenuate T cell activation 
and antagonize pathological inflammatory T cell infiltrates 28,40 . However, the functions of 
avian defensins and BTNLs in immune responses to influenza viruses in birds are uncertain. 

Transcriptome analysis indicated that eight avian defensin genes (AvDB2, AvDB3C, 
AvDB3D, AvDB4, AvDB5, AvDB7, AvDB8 and AvDB9), including two LSDs of these genes 
(AvDB3C and AvDB3D), had expression that was markedly increased by 7.6- to 1,551 -fold 
(FDR <0.001, fold change >2) in DK/49- or GS/65-infected ducks compared with controls. 
Unexpectedly, AvDBlO and AvDB13 showed expression that was significantly elevated by 
3.9- to 5.1-fold 1 d after inoculation that returned to normal, basal levels at 2 d and was 
downregulated by 8.6- to 12-fold at 3 d in DK/49-infected ducks. However, such 
pronounced changes in AvDBlO and AvDBl 3 expression were not detected 1-3 d after 
inoculation in GS/65-infected ducks, where significant change in gene expression was only 
observed for AvDB13 2 d after inoculation (Fig. 2c). Compared with GS/65-infected ducks, 
DK/49-infected ducks had three avian defensin genes {AvDB2, AvDB4 and AvDB9) with 
expression that was significantly increased by 3.1- to 971 -fold 1 d after inoculation, and the 
reverse was true (with expression significantly decreased by 3.1- to 971 -fold) 2 d after 
inoculation (Fig. 2c). Moreover, two avian defensin genes {AvDB3D and AvDBS) had 
expression that was significantly elevated by 4.1- to 8.9-fold 1 d after inoculation, and one 
defensing gene (AvDBT) had expression that was significantly decreased by 9.8-fold at 2 d 
in DK/49-infected ducks compared with GS/65 -infected ducks (Fig. 2c). Of the 17 BTNL 
genes, 1 1 genes, including 8 LSDs, showed expression that was markedly elevated by 2.0- 
to 7.5-fold 1-3 d after inoculation in DK/49- or GS/65-infected ducks compared with 
control individuals. In comparison with GS/65-infected ducks, DK/49-infected ducks had 
one BTNL gene with expression that was markedly decreased by 4.5-fold and five BTNL 
genes, including three LSDs, with expression that was substantially increased by 2.1- to 3.2- 
fold (Fig. 2c). 
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DISCUSSION 

We draw four noteworthy conclusions from our results. First, using next-generation 
sequencing technology, we generated the first draft sequence for a waterfowl, one which is a 
natural host of avian influenza viruses. Second, we showed that immune-related gene 
repertoires in three bird lineages (including 3,1 16-3,355 genes) seemed to be contractive 
compared to equivalent repertories in mammals (with 5,715-6,044 genes). Further efforts 
identified about 150 cytokines in three bird species using assemblies in Ensembl57 and the 
NCBI non-redundant database (as of November 2012), whereas there were about 220 
cytokines in 2 mammal species (Table 2). Third, we performed deep transcriptome analysis 
to characterize gene expression profiles and to identify genes that are responsive to avian 
influence viruses (for example, AvIFT, AvDB7-AvDB10, IFITM5 and IFITM10). Fourth, we 
found that some LSDs of avian defensin and BTNL genes might be involved in host immune 
response to both of the two H5N1 viruses whose infection was examined in ducks. 

The duck genome possesses a contractive immune -related gene repertoire similar to those of 
the chicken and zebra finch, and it includes genes that are not present in the other three 
species whose genomes have been sequenced (Fig. 1). In the analyses presented here, we 
found that many genes (for example, BTNLs and defensins) were independently duplicated 
in the duck but not in the chicken genome. These results suggest that gene gain and loss 
have influenced the divergence of the four avian genomes and the evolution of their 
respective immune systems. |3-defensins are induced in response to influenza virus 
infection 38 , and BTNLs are involved in T cell activation and infiltration in mammals 28,40 . 
Notably, most of the defensin and BTNL genes, including two LSDs of AvDB3 and eight 
LSDs of BTNL genes, may be implicated in the host immune response to influenza in duck; 
therefore, a functional analysis of the defensins and BTNLs of birds will be of interest to the 
study of avian influenza virus infection. Moreover, the duck seems to benefit from positive 
selection on specific genes functioning in the host-virus interaction. This presence of this 
benefit is supported by the detection of slightly higher frequencies of positively selected 
genes in the sets of differentially expressed genes identified following infection with H5N1 
viruses compared to the duck reference gene set reported in this study. The protein 
sequences of the two H5N1 viruses investigated in this study are highly conserved, with the 
exception of 20 amino acid alterations distributed over 7 genes; however, one virus is highly 
pathogenic (DK/49), and the other is weakly pathogenic (GS/65) in ducks 13 . Notably, the 
optimized immune system of ducks can be overcome by the highly pathogenic H5N1 virus 
but not by the weakly pathogenic H5N1 virus. This distinction identifies disruptions in the 
long-standing equilibrium between ducks and avian influenza viruses. Our future ability to 
assess the functions of genes showing significantly different expression induced by highly 
pathogenic H5N1 viruses compared with weakly pathogenic H5N1 viruses, using genetic 
manipulations and co-evolutionary analyses, will certainly extend knowledge of the avian 
genes related to influenza in birds. 
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ONLINE METHODS 

Sequence assembly 

We constructed paired-end DNA libraries with insert size smaller and larger than 2 kb 
according to the manuals for the standard DNA and mate -pair libraries, respectively 
(Illumina). The sequencing process followed the manufacturer's recommendations. After 
removing duplicate reads introduced by PCR, base calling and adaptor sequence contained 
in the raw reads, we assembled the duck genome with SOAPdenovo 14 . 

Gene evolution 

We created a reference gene set by merging the homology set, the de novo set and the 
GLEAN set. We built gene families using the Ensembl pipeline 41 with genomes in 
Ensembl59 in addition to the duck and turkey genomes, and we subsequently estimated the 
change of gene complements using the CAFE (computational analysis of gene family 
evolution) tool 21 . We predicted the positively selected genes using the codeml program 
under the branch-site model 29 . We constructed maximum-likelihood trees with protein 
sequences using PHYML version 2.4.4 under the JTT model with four substitution rate 
classes 42 . 



Facility 

Studies of the H5N1 viruses (DK/49 and GS/65) were conducted in a biosecurity level 3+ 
laboratory that was approved by the Chinese Ministry of Agriculture. All animal studies 
were approved by the Review Board of the Harbin Veterinary Research Institute, Chinese 
Academy of Agriculture Sciences. 

Viruses, duck infection and RNA extraction 

The DK/49 and GS/65 H5N1 viruses were isolated from a duck and a goose, respectively, 
during the avian influenza outbreak of 2005 in China 43 . The viruses were propagated in 10- 
d-old fertilized chicken eggs. Two groups of 4-week-old specific pathogen-free (SPF) 
Shaoxin ducks from the Harbin Veterinary Research Institute, China Academy of 
Agricultural Science, were inoculated intranasally with 10 3 of 50% egg infectious doses 
(EID50) of the DK/49 and GS/65 viruses after adapted to a biosecurity level 3+ environment 
for 4 d. The lungs were collected from the above H5N1 virus-infected ducks at days 1, 2 
and 3 after inoculation and from uninfected 4-week-old SPF ducks (n = 3, except n = 2 for 
the GS/65-infected ducks 2 d after inoculation, and n = 1 for the DK/49-infected ducks 3 d 
after inoculation). 

Total RNA was extracted from approximately 100 mg of each lung tissue sample using the 
Qiagen RNeasy kit. RNA concentration and quality were measured using an Agilent 2100 
Bioanalyzer, showing that all RNA samples had an RNA integrity number (RIN) of >7.3 
and a ratio of 28S:18S rRNA of >1.0. Subsequently, each pooled lung RNA sample for 
DK749-infected ducks 1-2 d after inoculation, GS/65-infected ducks 1-3 d after inoculation 
and control ducks was separately prepared from equal masses of two or three individuals. 
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RNA sequencing 

cDNA libraries were prepared according to the manufacturer's instructions (Illumina). One 
mRNA sample of the lung collected from a duck infected with the DK/49 virus 3 d after 
inoculation and six pooled lung RNA samples were purified from total RNA using Dynal 
Oligo(dT) beads and fragmented into small pieces of approximately 200 nt using RNA 
Fragmentation Reagents (Ambion). Cleaved mRNA fragments were reverse transcribed into 
single cDNAs using Superscript II (Invitrogen) and were primed with random primers; 
double-stranded cDNA was then synthesized using RNase H (Invitrogen) and DNA Pol I 
(Invitrogen). Subsequently, cDNA was subjected to end repair and phosphorylation using 
Klenow polymerase (Enzymatics), T4 DNA polymerase (Enzymatics) and T4 
polynucleotide kinase (to blunt end the DNA fragments) (Enzymatics). End-repaired cDNA 
fragments were 3' adenylated using Klenow (exo-) DNA polymerase (Enzymatics). Then, 
Illumina paired-end adapters were ligated to the ends of these 3'-adenylated cDNA 
fragments. Gel electrophoresis was used to separate the cDNA fragments from any unligated 
adapters. cDNA fragments from 180-220 bp in size were selected. cDNA libraries were 
amplified using 12 cycles of PCR with Phusion polymerase (NEB), and 75-cycle paired-end 
sequencing was performed on the Illumina Genome Analyzer. 

Transcriptome analysis 

We clustered the BGI and Ensembl reference gene sets (Supplementary Note) and the duck 
transcripts deposited in the NCBI database to create a merged reference gene set consisting 
of 20,647 protein-coding genes. We aligned the high-quality reads to the genome and the 
merged reference gene set using SOAPaligner with a threshold of five mismatches. For 
multiposition hits, one of the best matching loci was chosen randomly. Only uniquely 
mapped reads were used for the analysis of gene expression levels. Differentially expressed 
genes were identified using Fisher's exact test 44 and Bonferroni-Hochberg correction (FDR 
<0.001, fold change >2) 45 . Differentially expressed genes identified in DK/49- or GS/65- 
infected ducks compared with control ducks were merged to create sets 1 and 2. 
Differentially expressed genes identified in DK/49-infected ducks compared with GS/65- 
infected ducks were combined into set 3 (Table 1). The three sets of differentially expressed 
genes were used to investigate biological processes using IPA. 

Supplementary Material 

Refer to Web version on PubMed Central for supplementary material. 
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Figure 1. 

Numbers of gene losses and gains across 17 vertebrates. Data are shown for 17 vertebrates, 
3 teleosts, 5 reptilians and 8 mammals. The numbers of gene gains (+) and losses (-) are 
given on branches or to the right of the taxa. The rates of gene gain and loss for the clades 
derived from the MRCAR (MRCA of reptiles), MRCAT (MRCA of teleosts) and MRCAM 
(MRCA of mammals) and for Xenopus tropicalis are 0.001 1, 0.0012, 0.0017 and 0.0019 per 
gene per million years, respectively. 
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Figure 2. 

Identification of genes responsive to influenza A viruses in the lungs of ducks infected with 
one of two H5N1 viruses on days 1, 2 and 3 after inoculation. The genes included here 
showed significant differences in gene expression (FDR <0.001, fold change >2) in at least 
one experiment. Genes shown in red had upregulated expression, and those shown in yellow 
had downregulated expression in infected ducks relative to controls or in DK/49 -infected 
relative to GS/65-infected ducks. (Full gene names are given in supplementary table 13.) 
Hierarchical clusters of genes and samples were based on Pearson's correlation and 



Nat Genet. Author manuscript; available in PMC 2014 April 29. 



Huang et al. 



Page 17 



Spearman's rank correlation analyses, respectively, (a) Overall gene expression profiles in 
DK/49- or GS/65-infected ducks compared to control animals. The heatmap was generated 
from hierarchical cluster analyses of both genes and samples, (b) Expression of 1 19 innate 
immune genes in DK/49- or GS/65-infected ducks. The heatmap was generated from 
hierarchical analysis of genes, showing significant changes in gene expression for 119 innate 
immune genes in DK/49- or GS/65-infected ducks 1-3 d after inoculation, (c) Expression of 
two significantly expanded gene families (P-defensins and BTNLs) in DK/49- or GS/65- 
infected ducks. The heatmap was generated from hierarchical analysis of genes, showing 
that most of the avian defensin and BTNL genes, including two LSDs of AvDB3 and eight 
LSDs of BTNL genes, have significantly altered gene expression in DK/49- or GS/65- 
infected ducks 1-3 d after inoculation. 
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Table 2 

Comparison of cytokines in the duck, chicken, zebra finch, human and mouse genomes 



Number of genes 



Class 


Family 


Duck 


Chicken 


Zebra finch 


Human 


Mouse 


Class I cytokines 


IL-2 receptor 


7 


7 


7 


8 


8 




IL-3 receptor 


2 


2 


1 


3 


3 




IL-6 receptor 


8 


7 


8 


12 


12 




Single-chain family 


4 


3 


6 


6 


4 


Class II cytokines 


Type I interferons 


4 


6 


4 


20 


17 




Type II interferons 


1 


1 


1 


1 


1 




IL- 10 family 


4 


4 


4 


6 


5 


PDGF family 


Cysteine-knot growth factors 


8 


8 


8 


9 


9 




4-helix bundle growth factors 


0 


0 


0 


2 


2 




P-trefoil growth factors 


19 


19 


21 


22 


21 




Other growth factors 


15 


17 


18 


20 


20 


TNF family 




11 


11 


10 


18 


18 


IL-1 family 




2 


3 


2 


10 


9 


IL-17 family 




5 


5 


6 


6 


6 


TGF-P family 


BMP2 subfamily 


2 


2 


2 


2 


2 




BMP5 subfamily 


4 


4 


2 


5 


5 




GDF5 subfamily 


3 


2 


2 


3 


3 




VGL subfamily 


1 


1 


1 


2 


2 




BMP3 subfamily 


2 


2 


3 


2 


2 




ADMP 


2 


2 


2 


0 


0 




Intermediate members 


5 


4 


5 


5 


5 




Activin subfamily 


4 


2 


2 


4 


4 




TGF subfamily 


3 


3 


2 


3 


3 




Distant members 


6 


6 


6 


8 


8 


Chemokines 


CC chemokines 


13 


14 


12 


28 


27 




CXC chemokines 


8 


8 


8 


17 


14 




CXC3C chemokines 


2 


1 


2 


1 


1 




C chemokines 


1 


1 


1 


2 


1 


Total 




150 


149 


150 


230 


218 
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